Introduction

This lab will explore the spatial distribution of climate damages across the US and the relationship between climate damages and areas with a high percentage of disadvantaged residents.

To run this lab, open a new, empty R script file.

Read through a section of the lab document and use CTRL-C then CTRL-V to copy a gray section of the lab code into your R file.

Examine the each block of code and try to understand what each statement does.

Sometimes it helps to first execute the code, view the results, then examine the code that produced the results.

Task 1: Setup the lab

Read the CEJST PowerPoint to remind yourself how the different individual disadvantaged categories are calculate, and how the overall category is calculated. Scan the CEJST website at

https://screeningtool.geoplatform.gov

CEJST’s basic spatial unit is the 2010 census tract. Tracts are largish neighborhoods averaging about 4,000 people. They are relatively stable over time. They are nested within counties in the same way the counties are nested within states.

Tracts are identified by an 11-character geoid that consists of fips codes for state (2 characters), county (3 characters) and tract (6 characters). This means the first two characters of the code identify the tract’s state, and the first five characters of the code identify the tract’s county. This make it very easy for R programmers to group_by() and summarize() tract-level data to states or counties.

Import the CEJEST dataset with the code below and spend some time browsing the extensive list of variables. Remember: all this data is now available to you at the tract (neighborhood) level!

Copy the code below into your R file, but skip the three lines that begin with “knitr”. Execute the code and examine the output R dataframes.

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warnings = FALSE)
library(tidyverse)
library(readxl)
library(sf)
library(RColorBrewer)
library(tmap)
library(plotly)

setwd("c:/temp/class2322")

source("functions.R")

# Task 1: Lab setup for CEJST data

# read national census tract CEJST data
j1 = read_excel("communities-2022-05-31-1915GMT.xlsx")

# create county geocode from first 5 characters of tract id
#   total population, and disadvantaged population 
j2 = j1 %>% 
  tolow() %>%  
  mutate(geocode = paste0("g", mid(census.tract.id,1,5))) %>% 
  select(geocode,census.tract.id,everything()) %>% 
  mutate(totpop = total.population,
         dispop = ifelse(identified.as.disadvantaged=="TRUE",
                         total.population,0))

# add population and disadvantaged population by county
# calculate disadv as true when over 50% of the population
# is disadvantaged
j3 = j2 %>% 
  group_by(geocode) %>% 
  summarize(totpop=sum(totpop,na.rm=T),
            dispop = sum(dispop,na.rm=T)) %>% 
  mutate(dispct = 100*dispop/totpop) %>% 
  mutate(disadv = ifelse(dispct > 50,TRUE,FALSE))

The j1, j2, and j3 files are the result of reading a spreadsheet of attribute data. There is no spatial data in those dataframes.

The CEJST developers have also released a giant(!) shapefile that contains the same attributes plus US 2010 census tract boundaries for all 74,000 US tracts. R seems to read the file effectively, but may hang if you try to draw all the US tracts using an R package like tmap.

The command below reads a spatial dataframe that has been filtered for Georgia tracts only. Unfortunately, the column names are brief and cryptic. Fortunately the developers have also given us a variable documentation file named “columns_csv” that nas explains what each variable is. Open columns_csv in a spreadsheet and scan through it.

gacejst = readRDS("gacejst.rds") %>% 
  # the two commands starting with "disadv =" are not a mistake.
  # the first sets every tract to 0, even those with missing values.
  # the second sets disadvantaged tracts to 1.
  # the variable sm_c is a binary 1/0 variable representing whether
  #    or not the tract is disadvantaged.
  mutate(disadv = 0,
         disadv = ifelse(sm_c == 1, 1, disadv),
         geocode = paste0("g",geoid10)) %>% 
  select(geocode,disadv,everything())

The following tmap code draws Georgia cejst tracts in an interactive map.

Zoom to the Georgia Tech area and identify disadvantaged tracts near campus.

tmap_mode("view")

tm_basemap("OpenStreetMap") +
tm_shape(gacejst) +
  tm_polygons("disadv",alpha=0.25)

Task 2: Import and setup of Hsiang climate impact data

Read or scan the Hsiang PowerPoint, and scan through the Science article included in the class folder.

Run the following code to import and clean up the Hsiang damages dataset.

# read county damages by sector file, replace missing values with 0
d1 = read_excel("hsiang-county_damages_by_sector.xlsx") %>% 
  tolow()

# rename variables, convert coastal damages from log to normal units

d2 = d1 %>%  mutate(geocode = paste0("g",mid(as.character(county.fips.code+100000),2,5))) %>% 
  rename(stabbr = state.code,
         cname = county.name,
         cpop12 = county.population..in.2012.,
         cinc12 = county.income..in.2012.,
         agric = agricultural.damage......4.major.crops.,
         mortal = mortality..deaths.per.100k.,
         energy = energy.expenditures....,
         lowrisklab = labor.low.risk....,
         highrisklab = labor.high.risk....,
         coast = coastal.damage..log10...county.income..,
         vcrime = violent.crime....,
         pcrime = property.crime....,
         total = total.damages....county.income.) %>%
  mutate(coast = 10**coast) %>% 
  replace(., is.na(.), 0) %>% 
  select(geocode,everything(),-county.fips.code)

Task 3: Read state and county GIS files

Read the state and county Census cartographic boundary files as R spatial dataframes.

At the end of this code block you will join the CEJST and Hsiang datasets to the county spatial dataframe, so all the county-level variables in those datasets will now be mappable by tmap.

# read state boundaries shapefile
s1 = st_read("cb_2018_us_state_20m.shp")
## Reading layer `cb_2018_us_state_20m' from data source 
##   `C:\temp\class2322\cb_2018_us_state_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 52 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS:  NAD83
# read county boundaries shapefile, create geocode for joining,
#    remove Alaska, Hawaii and non-states
c1 = st_read("cb_2018_us_county_20m.shp") %>% 
  tolow() %>% 
  mutate(geocode = paste0("g",geoid)) %>% 
  filter(statefp != "02" & statefp != "15") %>% 
  filter(statefp <= "56")
## Reading layer `cb_2018_us_county_20m' from data source 
##   `C:\temp\class2322\cb_2018_us_county_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS:  NAD83
# join damages file and CEJEST file by geocode
c2 = c1 %>% 
  left_join(d2) %>% 
  left_join(j3) %>% 
# create county name with state abbreviation  
  mutate(cname2 = paste0(cname,", ",stabbr)) %>% 
# round values for better display on graphs and maps
  mutate(dispct = round(dispct,1),
         total = round(total,1)) %>% 
  select(cname2,everything())

Task 4: Comparing total damages for disadvantaged vs. non-disadvantaged counties

To compare damages in disadvantaged vs. non-disadvantaged counties we could just calculate the mean values for counties in each group. However that would treat large population and small population counties as equal contributors to the means. Instead we should use a weighted mean, which calculates a mean using weights of each county’s total population.

Remember total damages are a percent of each county’s GDP.

What do the results show?

Do disadvantaged counties suffer higher or lower losses of GDP, compared to the other counties.

damagesmean = c2 %>% 
  group_by(disadv) %>% 
  summarize(average = weighted.mean(total,totpop,na.rm=T))
damagesmean

We’ll now create a scatter plot of the relationship between percent of disadvantaged population (on x axis) vs. total damages as a percent of county GDP.

Remember, putting ggplotly() around a ggplot command creates an interactive version of the chart.

Note: the geom_smooth with method=“lm” command fits a regression line to the points.

Question: what does it mean when the line slopes upward?

After you create the chart, spend 5 minutes or so using the mouse to explore some of the extreme points in the chart.

ggplotly(
ggplot(c2) +
  geom_point(aes(x=dispct,y=total,
                 col=disadv,label=cname2)) +
  scale_color_manual(values=c("deepskyblue1","chocolate2")) +
  geom_smooth(aes(x=dispct,y=total),method="lm") +
  xlab("County Percent Disadvantaged") +
  ylab("Climate Damages as Percent of County GDP") +
  ggtitle("Are Expected Climate Change Damages Higher
          in Counties with over 50% Disadvantaged Population?"))
## Warning: Ignoring unknown aesthetics: label
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

The ggplot below produces a variation of the prior map by coloring Georgia counties in red. You can think of the regression line as showing the average damage level for counties with different percentages of disadvantaged population.

What does it mean that most Georgia counties are above the regression line?

ggplotly(
ggplot() +
  geom_point(data=c2,aes(x=dispct,y=total,
                 col=disadv,label=cname2)) +
  scale_color_manual(values=c("deepskyblue1","yellow")) +
  geom_smooth(data=c2, aes(x=dispct,y=total),method="lm") +
  geom_point(data=(c2 %>% filter(statefp=="13")),
             aes(x=dispct,y=total,
                 label=cname2), col="red") +
  xlab("County Percent Disadvantaged") +
  ylab("Climate Damages as Percent of County GDP") +
  ggtitle("Are Expected Climate Change Damages Higher
          in Counties with over 50% Disadvantaged Population?"))
## Warning: Ignoring unknown aesthetics: label

## Warning: Ignoring unknown aesthetics: label
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

Task 5: Creating maps of disadvantaged counties and climate damages

Create maps showing the spatial distribution of disadvantaged counties and total climate damages.

tmap_mode("view")

# create a map of disadvantaged areas
tm_shape(c2) +
  tm_polygons("disadv", palette="Reds") +
  tm_shape(s1) +
  tm_borders("black",lwd=1.5)
# create a map showing total damages (as percent of county GDP)
tm_shape(c2) +
  tm_polygons("total", palette="-RdYlBu") +
tm_shape(s1) +
  tm_borders("black",lwd=1.5)

Question: what does it mean when a county’s total damages number is negative?

Starting with the code from the total damages map, create maps showing agricultural damages, mortality damages, and coastal damages (from hurricanes). Use a different palette from RColorBrewer for each map. Use google or duckduckgo to search for an RColorBrewer image showing all the RColorBrewer palettes.


Task 7: Black population and climate damages

The next line of code loads an American Community Survey dataset that includes total population and black population by county, and a black percentage variable.

b1 = readRDS("usacs20.rds")

Write code to create a new county dataset named c3 by starting with the existing county dataset c2 and conducting a left_join to add the b1 variables. See above in Task 3 for similar joins.

Write code to calculate the weighted mean of black percent population comparing disadvantaged to non-disadvantaged counties. See above Task 4 for an example of similar code.

Write code to create a ggplot scatterplot with black percentage on the x axis and total damages on the y axis. See above in Task 4 for an example of similar code. Execute a google or duckduckgo search for “colors in R Ying Wei.” Use this colors cheat-sheet to select your colors for the scatterplot.